We study the graph constructed on a Poisson point process in $d$ dimensions by connecting each
point to the $k$ points nearest to it. This graph a.s. has an infinite cluster if $k > k_c(d)$ where $k_c(d)$,
known as the critical value, depends only on the dimension $d$. This paper presents an improved
upper bound of 188 on the value of $k_c(2)$. We also show that if $k \geq 188$ the infinite cluster of
NN(2, k) has an infinite subset of points with the property that the distance along the edges of the
graph between these points is at most a constant multiplicative factor larger than their Euclidean
distance. Finally we discuss in detail the relevance of our results to the study of multi-hop wireless
sensor networks.



1 Introduction 


The $k$-nearest neighbour graph of a point set $S$ in a metric space is constructed according to the
following natural definition: for each point $x \in S$ establish an edge from $x$ to the $k$ points of $S \setminus \{x\}$
nearest to it. Such graphs have applications in numerous areas: classification problems of all flavours,
topology control in wireless networks [HI [22], data compression [13 [l], dimensionality reduction [19]
and multi-agent systems [10].

We focus on $k$-nearest neighbor graphs on random point sets in $\mathbb{R}^d$ assuming that the distance
is the Euclidean distance. Further we restrict ourselves to the case where the edges established are
undirected. Clearly it is not necessary that this graph be connected for arbitrary $k$ and $S$ or even that
it have a large connected component. However, Häggström and Meester [13] have shown that if the
set $S$ is generated by a Poisson point process then there is a finite value $k_c(d)$ depending only on the
dimension such that if $k > k_c(d)$, the $k$-nearest neighbor graph has a connected component which is
infinite. In this paper we study this setting further. Following the notation in [13] we will denote this
model in $d$ dimensions, parametrized by $k$, as NN(d, k).
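The definition above can be sketched in a few lines. The following is a minimal brute-force illustration, not the construction used for any result in this paper; the function name `knn_graph` and the choice of uniform random test points are assumptions made for the example.

```python
import math
import random


def knn_graph(points, k):
    """Return the undirected k-nearest-neighbour graph as a dict of neighbour sets."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p (brute force).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            # An edge is kept if either endpoint selects the other,
            # which makes the graph undirected.
            adj[i].add(j)
            adj[j].add(i)
    return adj


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    g = knn_graph(pts, 3)
    print("minimum degree:", min(len(nbrs) for nbrs in g.values()))
```

Because edges are undirected, a point may end up with more than $k$ neighbours, but never fewer as long as the point set has more than $k$ points.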

In this paper we show for NN(2, k) that if $k \geq 188$ the infinite cluster¹ has an infinite subset
of points with the property that the metric distortion between them is bounded by a constant, i.e. for
any pair of points in this infinite subset the shortest distance between them achieved along a path in
the graph is at most a constant multiplicative factor larger than the Euclidean distance between them.
In the process of proving the latter result we improve the best known bound for $k_c(2)$ to 188 from 213
(due to Teng and Yao [20]). Our proof technique generalizes easily to NN(d, k) for $d \geq 3$.

¹ We will use the terms component and cluster interchangeably.



Organization. The rest of this section is devoted to surveying related work and introducing the
terms and notation we will use. A new bound on $k_c(2)$ and the result on the metric distortion within the
infinite cluster are presented in Section 2. We discuss the applicability of our results to wireless
multi-hop sensor networks in Section 3, concluding with a discussion of some simulation results and
conjectures arising from them in Section 4.

1.1 Related work 

The study of random graphs obtained by applying connection rules on stationary point processes is
known as continuum percolation. Meester and Roy's monograph on the subject provides an excellent
view of the deep theory that has been developed around this general setting [16]. The NN(d, k) model
was introduced by Häggström and Meester [13]. They showed that there is a finite critical value
$k_c(d)$ for all $d \geq 2$ above which an infinite cluster exists in this model. They proved that the infinite
cluster is unique and that there is a value $d_0$ such that $k_c(d) = 2$ for all $d \geq d_0$. Teng and Yao gave
an upper bound of 213 for $k_c(2)$ [20].

$k$-nearest neighbor graphs on random point sets contained inside a finite region have been extensively
studied. The major concern, different from ours, has been to ensure that all the points within the region
are connected within the same cluster. Balister, Bollobás, Sarkar and Walters [5] showed that the
smallest value of $k$ that will ensure connectivity lies between $0.3043 \log n$ and $0.5139 \log n$, improving
earlier results of Xue and Kumar [22]. Balister et al. also studied the problem of covering the region
with the discs containing the $k$ nearest neighbours of the points. We refer the reader to [5] for an
interesting discussion relating this setting to earlier work by Penrose and others.

Eppstein, Paterson and Yao [9] studied $k$-nearest neighbour graphs on random point sets in two
dimensions in some detail and proved interesting bounds showing that the number of points in a
component of depth $D$ is polynomial in $D$ when $k$ is 1 and exponential in $D$ when $k$ is 2 or
greater. Their primary interest was in obtaining low dilation embeddings of nearest-neighbor graphs.

Algorithms for searching for nearest neighbors (see e.g. [H[2T]) and constructing nearest neighbor
graphs efficiently have also received a lot of attention (see e.g. [18]). However these are not directly
related to our setting so we do not survey this literature in detail.

1.2 Definitions and Notation 

Poisson point processes. Our random point sets are generated by homogeneous Poisson point processes
of intensity $\lambda$ in $\mathbb{R}^d$ where $d \geq 1$. Under this model the number of points in a region is a random
variable that depends only on its $d$-dimensional volume, i.e. the number of points in a bounded, measurable
set $A$ is Poisson distributed with mean $\lambda V(A)$ where $V(A)$ is the $d$-dimensional volume of $A$.
Further, the random variables associated with the number of points in disjoint sets are independent.
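A minimal sketch of how such a process can be sampled on a bounded box follows. It relies on the standard fact that, given the Poisson-distributed count, the points are independent and uniform on the box. The names `poisson_sample` and `poisson_points`, and the restriction to a rectangle in the plane, are assumptions made for illustration.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson(mean) variate by counting unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def poisson_points(intensity, width, height, rng):
    """Sample a homogeneous Poisson point process on [0, width] x [0, height]."""
    # The number of points is Poisson with mean lambda * V(A).
    n = poisson_sample(intensity * width * height, rng)
    # Given the count, the points are independent and uniform on the box.
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


if __name__ == "__main__":
    pts = poisson_points(intensity=2.0, width=10.0, height=10.0, rng=random.Random(0))
    print("number of points:", len(pts))
```

The exponential-arrival sampler takes time proportional to the mean, which is adequate for the box sizes used in small experiments.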

Site percolation. Consider an infinite graph defined on the vertex set $\mathbb{Z}^d$ with edges between points
$x$ and $y$ such that $\|x - y\|_1 = 1$. Site percolation is a probabilistic process on this graph. Each point
of $\mathbb{Z}^d$ is taken to be open with probability $p$ and closed with probability $1 - p$. The product of all
the measures for individual points forms a measure for the space of possible configurations. An edge
between two open vertices is considered open. All other edges are considered closed. A component
in which open vertices are connected through paths of open edges is known as an open cluster. It is
known that there is a value $p_c$ such that for all $p > p_c$ the graph obtained has an infinite open cluster.
This value is known as the critical probability. When $p > p_c$ each point of $\mathbb{Z}^d$ has some non-zero
probability of being part of an infinite cluster. The reader is referred to [12] for a full treatment of
percolation and to [7] for a recent update on some new directions in this area.
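The process just described can be illustrated on a finite window of $\mathbb{Z}^2$. The sketch below opens each site independently and measures the largest open cluster by breadth-first search. The finite $n \times n$ window and the function names are assumptions of the example, since the definition itself concerns the infinite lattice.

```python
import random
from collections import deque


def sample_sites(n, p, rng):
    """Open each site of an n x n window of Z^2 independently with probability p."""
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def largest_open_cluster(is_open):
    """Size of the largest open cluster, using edges between sites at L1 distance 1."""
    n = len(is_open)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for sx in range(n):
        for sy in range(n):
            if not is_open[sx][sy] or seen[sx][sy]:
                continue
            seen[sx][sy] = True
            queue, size = deque([(sx, sy)]), 0
            while queue:
                x, y = queue.popleft()
                size += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < n and 0 <= ny < n and is_open[nx][ny] and not seen[nx][ny]:
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            best = max(best, size)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.5, 0.7):
        grid = sample_sites(100, p, rng)
        print(p, largest_open_cluster(grid))
```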



2 An infinite subset of $C_\infty$ has constant metric distortion

The graph distance between pairs of points in a $k$-nearest neighbor graph is clearly at least the Euclidean
distance between them. The question arises whether the graph distance can be arbitrarily larger than the
Euclidean distance. Clearly, for points in different clusters this question makes no sense. We also ignore for
now the question of what happens inside finite clusters, focussing on the infinite cluster in the
supercritical phase of NN(d, k). We conjecture that it is possible to show that the distance between any
pair of points in the infinite cluster is only a small factor larger than the Euclidean distance between
them. In this paper we prove a weaker result: the infinite cluster contains an infinite subset of points
whose pairwise distances are not distorted by more than a constant factor. In order to do this we
first present a construction that allows us to couple NN(2, k) with a site percolation process in $\mathbb{Z}^2$.
This construction also improves the best known upper bound for $k_c(2)$. Then we show how to use the
algorithm of Angel et al. [2] for routing on a percolated mesh to find a short path between a pair of
vertices in NN(d, k).
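To make the notion of metric distortion concrete, the following sketch computes the ratio of graph distance to Euclidean distance for a pair of vertices, with each edge weighted by its Euclidean length. It uses Dijkstra's algorithm purely for illustration and is not the routing algorithm of Angel et al.; the function names and the adjacency-dictionary representation are assumptions.

```python
import heapq
import math


def graph_distance(points, adj, src, dst):
    """Shortest path length from src to dst, with edges weighted by Euclidean length."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + math.dist(points[u], points[v])
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # src and dst lie in different clusters


def distortion(points, adj, u, v):
    """Ratio of graph distance to Euclidean distance for two distinct vertices."""
    return graph_distance(points, adj, u, v) / math.dist(points[u], points[v])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    adj = {0: {1}, 1: {0, 2}, 2: {1}}
    print(distortion(pts, adj, 0, 2))  # path of length 2 against a diagonal of length sqrt(2)
```

For points in different clusters the function returns infinity, matching the remark that the question is not meaningful in that case.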

2.1 Coupling NN(2, k) to site percolation in $\mathbb{Z}^2$

Like Häggström and Meester's proof for the existence of a critical value [13] and Teng and Yao's proof
for the weaker of their two upper bounds on $k_c(2)$ [20], we proceed by constructing a coupling with a
site percolation process on $\mathbb{Z}^2$. However, our construction gives a better upper bound than Teng and
Yao's improvement of their own result (also in [20]) to $k_c(2) \leq 213$, which uses a coupling to a mixed
percolation process. We are able to improve this result to show $k_c(2) \leq 188$. Note that both papers,
the one by Häggström and Meester and the one by Teng and Yao, reported that simulations seemed to
indicate that the value of $k_c(2)$ is around 3. Our simulations backed up this finding. Let
us now proceed to a formal statement of the main theorem of this section and its proof.

Theorem 2.1 For the k-nearest neighbour model in a Poisson point process setting

$k_c(2) \leq 188.$




Figure 1: A tile $t$ and its 9 relevant regions. Note that the region $E_r$ lies wholly within all discs of the
form $C_x$ and $C_z$ centred at points on the boundary of the discs $C_0$ and $C_r$.



Proof. In order to prove the theorem we couple a site percolation process on $\mathbb{Z}^2$ with the $k$-nearest
neighbour graph as follows. We divide $\mathbb{R}^2$ into square tiles of side $10a$ where $a$ is a parameter whose
value will be fixed later. We create a bijection, $\phi$, between these tiles in $\mathbb{R}^2$ and points in $\mathbb{Z}^2$ such that
neighbouring tiles in $\mathbb{R}^2$ correspond to neighbouring points in $\mathbb{Z}^2$. We couple the processes by saying
that a given point $x$ in $\mathbb{Z}^2$ is open if and only if a certain event $A_t$ occurs for the tile $t = \phi^{-1}(x)$. We now
define this event $A_t$.

Let us look at a tile $t$ centred at $(0, 0)$ with bottom left corner $(-5a, -5a)$ and top right corner
$(5a, 5a)$. For convenience we will refer to the tiles surrounding the tile $t$ as, counterclockwise starting
from the right, $t_r$, $t_t$, $t_l$ and $t_b$. We consider five circles of radius $a$: $C_0$ centred at $(0, 0)$, $C_l$ centred at
$(-4a, 0)$, $C_r$ centred at $(4a, 0)$, $C_t$ centred at $(0, 4a)$ and $C_b$ centred at $(0, -4a)$. There are four other
regions, which are named $E_l$, $E_r$, $E_t$ and $E_b$ in the figure. $E_r$ is defined as follows. Consider the largest
circle centred at any point in $C_0$ or $C_r$ that lies wholly within the two tiles $t$ and $t_r$. Two such circles,
$C_x$ and $C_z$, are depicted in Figure 1. $E_r$ is the locus of the points contained in all such circles. The
regions $E_l$, $E_t$ and $E_b$ are defined similarly by $C_0$ along with $C_l$, $C_t$ and $C_b$ respectively and the tiles $t_l$, $t_t$
and $t_b$ respectively.

Now, for a tile $t$, the event $A_t$ is said to occur if

1. the number of points inside $t$ is at most $k/2$ and

2. the nine regions $C_0$, $C_l$, $C_r$, $C_t$, $C_b$, $E_l$, $E_r$, $E_t$ and $E_b$ contain at least one point each.
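The two conditions defining $A_t$ can be checked mechanically, which gives one way to estimate the probability of $A_t$ by simulation. The sketch below is a rough illustration under several assumptions that are not stated in the text: membership in $E_r$ is tested only against circles centred on a discretised boundary of $C_0$ and $C_r$ (following the caption of Figure 1), the other three regions are obtained from $E_r$ by quarter turns, the intensity is taken to be 1, and all function names are hypothetical. It is not the numerical calculation used in the proof.

```python
import math
import random


def in_E_r(p, a, steps=72):
    """Test whether p lies in E_r for the tile [-5a, 5a]^2 (discretised check)."""
    for centre in ((0.0, 0.0), (4.0 * a, 0.0)):  # centres of C_0 and C_r
        for i in range(steps):
            theta = 2.0 * math.pi * i / steps
            cx = centre[0] + a * math.cos(theta)
            cy = centre[1] + a * math.sin(theta)
            # Radius of the largest circle centred at (cx, cy) inside the
            # rectangle [-5a, 15a] x [-5a, 5a] formed by the tiles t and t_r.
            radius = min(cx + 5.0 * a, 15.0 * a - cx, 5.0 * a - abs(cy))
            if math.dist(p, (cx, cy)) > radius:
                return False
    return True


# Quarter-turn maps that carry E_r, E_t, E_l, E_b (in that order) onto E_r.
ROTATIONS = (
    lambda p: (p[0], p[1]),
    lambda p: (p[1], -p[0]),
    lambda p: (-p[0], -p[1]),
    lambda p: (-p[1], p[0]),
)


def event_A(points, a, k):
    """Check the two conditions defining A_t for the points that fall in tile t."""
    if len(points) > k / 2:  # condition 1: at most k/2 points in the tile
        return False
    discs = ((0.0, 0.0), (4.0 * a, 0.0), (-4.0 * a, 0.0), (0.0, 4.0 * a), (0.0, -4.0 * a))
    if not all(any(math.dist(p, c) <= a for p in points) for c in discs):
        return False
    return all(any(in_E_r(rot(p), a) for p in points) for rot in ROTATIONS)


def poisson_sample(mean, rng):
    """Poisson(mean) variate via unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def estimate_prob_A(a, k, trials, rng, intensity=1.0):
    """Monte Carlo estimate of P(A_t); intensity 1 is an assumption of this sketch."""
    hits = 0
    for _ in range(trials):
        n = poisson_sample(intensity * (10.0 * a) ** 2, rng)
        pts = [(rng.uniform(-5.0 * a, 5.0 * a), rng.uniform(-5.0 * a, 5.0 * a))
               for _ in range(n)]
        hits += event_A(pts, a, k)
    return hits / trials


if __name__ == "__main__":
    print(estimate_prob_A(a=0.893, k=188, trials=1000, rng=random.Random(0)))
```

Since all nine regions lie inside the tile $t$ under this reading, the event depends only on the points of $t$, so the events for distinct tiles are independent, as site percolation requires.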

If $A_t$ occurs we call a point contained in $C_0$ the representative point of the tile $t$, denoted rep($t$).
In order to relate the process on $\mathbb{Z}^2$ defined via these events $A_t$ to the NN(d, k) model, we claim that
the existence of an edge in $\mathbb{Z}^2$ implies the existence of a path between the representative points of the two
tiles corresponding to the two end points of the edge. We state this formally, including an observation
about the metric distortion of the length of the path between the two representative points.

Claim 2.2 If an edge exists in the percolated mesh $\mathbb{Z}^2$ between two points $x$ and $y$ then

1. there is a path between the representative points rep($\phi^{-1}(x)$) and rep($\phi^{-1}(y)$) of the tiles
corresponding to $x$ and $y$ in NN(2, k) and

2. there is a constant $c_{\text{tiles}}$ such that

$d_k(\mathrm{rep}(\phi^{-1}(x)), \mathrm{rep}(\phi^{-1}(y))) \leq c_{\text{tiles}} \cdot d(\mathrm{rep}(\phi^{-1}(x)), \mathrm{rep}(\phi^{-1}(y))).$




Figure 2: A path between two representative points of tiles for both of which the event $A_t$ has occurred.



Proof of Claim 2.2. The proof of the claim is depicted in Figure 2. Clearly any circle centred at
rep($t$) that stays within $t$ contains all of $E_r$ by the definition of $E_r$. Since there are at most $k/2$
points in every tile for which $A_t$ has occurred, there is an edge from rep($t$) to the point guaranteed
to be contained in $E_r$ by the definition of $A_t$; let us call it $x_r$. We do not make any claims on where the
edges established by $x_r$ to its neighbours lie, observing only that any point that lies in $C_r$ must have
an edge to $x_r$, again by the definition of $E_r$. However, any disc centred at a point in $C_r$ that remains
within $t$ and $t_r$ must contain the left disc of its neighboring tile. Hence, if $A_t$ and $A_{t_r}$ occur then a
path from rep($t$) to rep($t_r$) exists. The second part of the claim is obviously true. The constant can
easily be calculated using calculus. □

From Claim 2.2 it is easy to deduce that if an infinite component exists in the site percolation
setting, then an infinite component exists in NN(2, k). Hence we need to determine for what settings
of our parameters $a$ and, more importantly, $k$, the site percolation process is supercritical. The critical
probability for site percolation on $\mathbb{Z}^2$ is approximately 0.59 (see e.g. [15]). Numerical calculations showed
that the smallest value of $k$ for which the probability of $A_t$ exceeds this value is 188, and the value of $a$
for which this happens is 0.893. □

2.2 A subset with constant metric distortion 

We now show that there is a set of points in $C_\infty$ and a constant $\alpha$ such that for each pair of points $x, y$
in this set

$D_k(x, y) < \alpha \cdot D(x, y).$

We will prove the following theorem.

Theorem 2.3 For NN(2, k) where $k \geq 188$, there is a set of points $S \subset C_\infty$ such that $|S| = \infty$ with
the following property: Let $x, y \in S$ be two points with Euclidean distance $D(x, y)$ between them whose
k-NN distance is $D_k(x, y)$. For some $\alpha, c$ depending only on $k$

$P(D_k(x, y) > \alpha \cdot D(x, y)) < e^{-c \cdot D(x, y)}.$

Proof. We take $S$ to be the set of representative points, from the construction described in the proof
of Theorem 2.1, that lie in the infinite cluster of NN(2, k). This is the subset that we will claim has the
required property. We use the coupling with site percolation in $\mathbb{Z}^2$ introduced in that proof to help us find
short paths between pairs of points in $S$.

Let us consider any two tiles $t_1$ and $t_2$ whose representative points rep($t_1$) and rep($t_2$) lie in $C_\infty$.
We denote the distance between two points $a, b$ in $\mathbb{Z}^2$ by $D_{\text{latt}}(a, b)$. First we relate the distance
in the (unpercolated) lattice to the Euclidean distance between these two points by observing a simple
fact.

Fact 2.4 Given that $c_{\text{tiles}}$ is the constant defined in Claim 2.2, then for two tiles $t_1, t_2$

c tiles 

When the lattice undergoes percolation, the simple open path from $\phi(\mathrm{rep}(t_1))$ to $\phi(\mathrm{rep}(t_2))$ may
be broken at several points. Antal and Pisztora studied this setting and proved a powerful theorem
which helps us here [3, Theorems 1.1 and 1.2]. We use it as a lemma here, adopting the restatement
of Angel et al. [2, Lemma 8].

Lemma 2.5 [3], [2] For any $p > p_c$ and any $x, y$ connected through an open path in a cube $M^d$ of the
infinite lattice, let $D^p_{\text{latt}}(x, y)$ be the distance between the two points in the percolated lattice. For some
$\rho, c_2 > 0$ depending only on the dimension and $p$ and for any $a > \rho \cdot D_{\text{latt}}(x, y)$

$P(D^p_{\text{latt}}(x, y) > a) < e^{-c_2 a}.$

Figure 3: The path between two representative points mimics the path in $\mathbb{Z}^2$.



To find a path between rep($t_1$) and rep($t_2$) we simply take the path in the percolated lattice between
$\phi(t_1)$ and $\phi(t_2)$ and mimic it in $\mathbb{R}^2$ as depicted in Figure 3. The result follows by combining Fact 2.4
and Lemma 2.5. □



We note here that our claim that the constants in the statement of Theorem 2.3 depend only on
the value of $k$ follows from the fact that the constants in Lemma 2.5 depend only on $p$, since in our
construction the size of a tile and the probability of $A_t$ occurring for a tile change when we change $k$.
We also note that Antal and Pisztora [3] prove their theorem for bond percolation but note that their
methods can easily be extended to site percolation.

It is possible to extend Theorem 2.3 easily for $d > 2$. The constants change and their dependence
on $d$ has to be handled carefully but the proof remains basically the same.



3 Applications to multi-hop wireless sensor networks

Multi-hop sensor networks, where nodes act not only to sense but also to relay information, have proven
advantages in terms of energy efficiency over single hop sensor networks [14] and are useful for necessary
tasks like time synchronization [11]. And for collaborative tasks like target tracking [23] sensor-to-sensor
communication is essential. But the total connectivity sought to be achieved in [22, 5] between
all the points of a point process is not necessary for these networks. It may be the correct model for
general ad hoc wireless networks where all nodes need to be connected, but for a sensor network we
argue the presence of a large connected component is enough.

Sensor networks seek to achieve coverage of a target area. When the locations of sensors are
modelled by point processes, the study of most coverage measures (whether single point coverage,
$k$-coverage or barrier coverage) has found that there is a critical density of the point process above
which the particular measure is satisfactory. For example [3] estimates the critical density required for
barrier coverage in strip-like regions, a notion of coverage where an object must be sensed if it tries to
cross a particular region.²

    n       k    avg.     max. value    percentage
    500     3    1.727    15.180        96.96%
    500     4    1.364    7.543         97.96%
    500     5    1.204    5.874         99.38%
    1000    3    1.660    22.64         97.08%
    1000    4    1.333    8.39          98.92%
    1000    5    1.172    4.385         99.82%
    1500    4    1.322    7.858         99.12%
    2000    4    1.285    9.512         99.4%

Table 1: Metric distortion in NN(2, k). The last column shows the percentage of pairs distorted by a
factor of 2 or less.

Our results show that it is possible to find an infinite component which has an infinite subset
of nodes whose graph distance is at most a constant times their Euclidean distance. Our constructions for the
proofs of Theorems 2.1 and 2.3, taken along with the fact that any point in $\mathbb{Z}^2$ has a non-zero
probability of being part of the infinite component in the supercritical phase, imply the following
theorem.

Theorem 3.1 For any $\lambda$, there is a $\lambda'$ such that NN(2, k) built on a point process of density $\lambda'$ with
$k \geq 188$ has an infinite component with the property that an infinite subset of points with density at
least $\lambda$ has the property that the graph distance between them is at most a constant times the Euclidean
distance between them. Moreover there is a constant $c$ such that $\lambda' < c\lambda$.

Clearly the existence of such a subset can fulfil the sensing requirement while not compromising
on the sensor-to-sensor data transfer capability. The value 188 seems prohibitive for most practical
purposes. But it is our hope that this upper bound will be improved down to a reasonable value closer
to the 2 conjectured by Häggström and Meester [13] and Teng and Yao [20] and that it will be possible
to prove Theorem 2.3 for this improved bound as well. We omit the proof of Theorem 3.1 here since
it does not add any new insight over the proofs already seen in this paper.



4 Conclusion 

We conclude by presenting some conjectures about the relationship of the metric distortion in NN(2, k) 
to the parameter k. These conjectures come from simulations we ran. 

The experiments had to be carried out on a finite box in $\mathbb{R}^2$, but to negate boundary effects we
simulated a point process in a large box and looked at the largest component formed within a smaller
box contained well within this larger box. We placed a number of points randomly within the larger
area (thereby achieving a target density). In Table 1 the first column has the number of points placed.
The table shows the average distortion for different values of $k$, the maximum value of the distortion and
the percentage of points having distortion less than two times the average. The table also indicates
that the distortion is independent of the number of points under consideration but depends on
the value of $k$.
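One possible reading of the boundary-effect precaution described above is sketched below: the graph is restricted to the points that fall in the inner box, and the largest connected component of that induced subgraph is extracted. The exact procedure used in our experiments is not spelled out here, so the induced-subgraph interpretation, the square inner box $[lo, hi]^2$ and the function name are all assumptions.

```python
from collections import deque


def largest_component_in_box(points, adj, lo, hi):
    """Largest connected component of the subgraph induced by points in [lo, hi]^2."""
    inside = {i for i, (x, y) in enumerate(points) if lo <= x <= hi and lo <= y <= hi}
    seen, best = set(), []
    for start in inside:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                # Only follow edges whose other endpoint is also in the inner box.
                if v in inside and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return sorted(best)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.6, 0.5), (0.7, 0.5), (5.0, 5.0), (0.1, 0.9)]
    adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: set()}
    print(largest_component_in_box(pts, adj, 0.0, 1.0))
```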

² See [14, Chap. 13.2] for a succinct summary of the issues involved in coverage.

To show the relationship between $k$ and the average distortion we plotted the average ratio against $k^2$ for a
range of values of $k$ from 3 to 13 for two random point sets. Figures 4 and 5 show plots for two such sets
along with a function $f(k) = 1 + a/k^2$ where $a$ is determined by least-squares fitting.

Figure 4: Average metric distortion on the y-axis and $k^2$ on the x-axis. The curve plotted is $1 + 4.62/k^2$.

Figure 5: Average metric distortion on the y-axis and $k^2$ on the x-axis. The curve plotted is $1 + 5.03/k^2$.

These findings lead us to conjecture that:

Conjecture 4.1 For the NN(2, k) model at a value $k > k_c(2)$

1. the metric distortion of the points of $C_\infty$ is at most 2 with probability tending to 1 and

2. there is a constant $c$ such that the expected metric distortion of the points of $C_\infty$ is of the form
$1 + c/k^2$.
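The least-squares fit of $f(k) = 1 + a/k^2$ used for Figures 4 and 5 is linear in $a$ and so has a closed form: with $x_i = 1/k_i^2$ and observed ratios $r_i$, the minimiser is $a = \sum_i x_i (r_i - 1) / \sum_i x_i^2$. The sketch below applies this to the three rows of Table 1 with $n = 1000$, purely as an illustration. These are not the data behind the figures, which cover $k$ from 3 to 13, so the fitted value is not expected to match 4.62 or 5.03. The function name `fit_a` is hypothetical.

```python
def fit_a(ks, ratios):
    """Least-squares a for the model f(k) = 1 + a / k**2 (linear in a, so closed form)."""
    xs = [1.0 / (k * k) for k in ks]
    num = sum(x * (r - 1.0) for x, r in zip(xs, ratios))
    den = sum(x * x for x in xs)
    return num / den


if __name__ == "__main__":
    # Average distortions from Table 1 for n = 1000 and k = 3, 4, 5.
    ks = [3, 4, 5]
    ratios = [1.660, 1.333, 1.172]
    print("fitted a:", fit_a(ks, ratios))
```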